feat(syscalls): reimplement alloc syscalls with stricter requirements #2428

Open: mkroening wants to merge 3 commits into main from alloc-syscalls


@mkroening mkroening commented May 11, 2026

This PR reimplements the alloc syscalls similar to #2426.

Changes:

  • No more trace logging. Alloc system calls can still be traced with feature = "strace" as before.
  • No more argument checking. We now have the same soundness preconditions as GlobalAlloc.
  • No more direct access to ALLOCATOR. We now go through Rust's #[global_allocator] machinery. The generated code is the same.

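The performance gains are small (about 16 % fewer instructions and roughly 30 % fewer estimated cycles in the benchmark below), but since allocations are often on the hot path, they should be worthwhile anyway.
If useful, the checks and log output could be added back behind cfg!(debug_assertions).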
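As a rough illustration of what "the same soundness preconditions as GlobalAlloc" asks of callers, the sketch below spells out the conditions (non-zero size, power-of-two alignment, size that does not overflow `isize` when rounded up to the alignment) and checks them only in debug builds. The names `is_valid_request`, `sys_alloc_sketch`, and `sys_dealloc_sketch` are hypothetical and are not taken from this PR.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical helper: the conditions a caller must uphold.
/// `Layout::from_size_align` rejects a zero or non-power-of-two alignment and
/// sizes that overflow `isize` when rounded up; `GlobalAlloc::alloc`
/// additionally requires a non-zero size.
fn is_valid_request(size: usize, align: usize) -> bool {
    size != 0 && Layout::from_size_align(size, align).is_ok()
}

/// Hypothetical syscall-like wrapper: checked in debug builds only.
///
/// # Safety
/// `size` and `align` must satisfy `is_valid_request`.
unsafe fn sys_alloc_sketch(size: usize, align: usize) -> *mut u8 {
    debug_assert!(
        is_valid_request(size, align),
        "invalid layout: size {size:#x}, align {align:#x}"
    );
    // SAFETY: the caller guarantees a valid, non-zero-sized layout.
    unsafe { alloc(Layout::from_size_align_unchecked(size, align)) }
}

/// # Safety
/// `ptr` must come from `sys_alloc_sketch` with the same `size` and `align`.
unsafe fn sys_dealloc_sketch(ptr: *mut u8, size: usize, align: usize) {
    debug_assert!(is_valid_request(size, align));
    // SAFETY: the caller guarantees that `ptr` was allocated with this layout.
    unsafe { dealloc(ptr, Layout::from_size_align_unchecked(size, align)) }
}

fn main() {
    assert!(is_valid_request(0x1000, 1));
    assert!(!is_valid_request(0, 8)); // zero-sized allocation
    assert!(!is_valid_request(16, 3)); // alignment is not a power of two

    let ptr = unsafe { sys_alloc_sketch(128, 128) };
    assert!(!ptr.is_null());
    assert_eq!(ptr as usize % 128, 0);
    unsafe { sys_dealloc_sketch(ptr, 128, 128) };
}
```

In a release build the `debug_assert!` calls compile away, so the wrappers reduce to the unchecked path.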

Benchmark

Valgrind benchmark using Gungraun.

# Cargo.toml
[package]
name = "alloc-benchmark"
edition = "2024"

[dev-dependencies]
gungraun = "0.18"
log = "0.4"
spinning_top = "0.3"
talc = "5"

[[bench]]
harness = false
name = "gungraun"
path = "benches/gungraun.rs"

[profile.bench]
debug = true
lto = "thin"
codegen-units = 1

// benches/gungraun.rs
use std::alloc::{GlobalAlloc, Layout, alloc, dealloc};
use std::hint::black_box;
use std::ptr;

use gungraun::prelude::*;
use log::{trace, warn};
use talc::TalcLock;
use talc::source::Claim;

#[global_allocator]
static TALC: TalcLock<spinning_top::RawSpinlock, Claim> = TalcLock::new(unsafe {
    static mut INITIAL_HEAP: [u8; 0x1_0000] = [0; _];

    Claim::array(&raw mut INITIAL_HEAP)
});

#[library_benchmark]
#[bench::page(0x1000, 1)]
#[bench::cache_line(128, 128)]
fn bench_new(size: usize, align: usize) {
    let size = black_box(size);
    let align = black_box(align);
    let ptr = unsafe { black_box(alloc_new(size, align)) };
    unsafe {
        dealloc_new(ptr, size, align);
    }
}

#[library_benchmark]
#[bench::page(0x1000, 1)]
#[bench::cache_line(128, 128)]
fn bench_old(size: usize, align: usize) {
    let size = black_box(size);
    let align = black_box(align);
    let ptr = unsafe { black_box(alloc_old(size, align)) };
    unsafe {
        dealloc_old(ptr, size, align);
    }
}

library_benchmark_group!(
    name = bench,
    compare_by_id = true,
    benchmarks = [bench_new, bench_old]
);

main!(library_benchmark_groups = bench);

unsafe fn alloc_new(size: usize, align: usize) -> *mut u8 {
    unsafe { alloc(layout_from_size_align(size, align)) }
}

unsafe fn dealloc_new(ptr: *mut u8, size: usize, align: usize) {
    unsafe {
        dealloc(ptr, layout_from_size_align(size, align));
    }
}

unsafe fn layout_from_size_align(size: usize, align: usize) -> Layout {
    if cfg!(debug_assertions) {
        Layout::from_size_align(size, align).unwrap()
    } else {
        unsafe { Layout::from_size_align_unchecked(size, align) }
    }
}

unsafe fn alloc_old(size: usize, align: usize) -> *mut u8 {
    let layout_res = Layout::from_size_align(size, align);
    if layout_res.is_err() || size == 0 {
        warn!("__sys_alloc called with size {size:#x}, align {align:#x} is an invalid layout!");
        return ptr::null_mut();
    }
    let layout = layout_res.unwrap();
    let ptr = unsafe { TALC.alloc(layout) };

    trace!("__sys_alloc: allocate memory at {ptr:p} (size {size:#x}, align {align:#x})");

    ptr
}

unsafe fn dealloc_old(ptr: *mut u8, size: usize, align: usize) {
    unsafe {
        let layout_res = Layout::from_size_align(size, align);
        if layout_res.is_err() || size == 0 {
            warn!(
                "__sys_dealloc called with size {size:#x}, align {align:#x} is an invalid layout!"
            );
            debug_assert!(layout_res.is_err(), "__sys_dealloc error: Invalid layout");
            debug_assert_ne!(size, 0, "__sys_dealloc error: size cannot be 0");
        } else {
            trace!("sys_free: deallocate memory at {ptr:p} (size {size:#x})");
        }
        let layout = layout_res.unwrap();
        TALC.dealloc(ptr, layout);
    }
}

Results

gungraun::bench::bench_new page:(0x1000, 1)
  Instructions:                         238|238                  (No change)
  L1 Hits:                              322|322                  (No change)
  LL Hits:                                0|0                    (No change)
  RAM Hits:                               4|4                    (No change)
  Total read+write:                     326|326                  (No change)
  Estimated Cycles:                     462|462                  (No change)
gungraun::bench::bench_new cache_line:(128, 128)
  Instructions:                         250|250                  (No change)
  L1 Hits:                              331|331                  (No change)
  LL Hits:                                0|0                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     338|338                  (No change)
  Estimated Cycles:                     576|576                  (No change)
gungraun::bench::bench_old page:(0x1000, 1)
  Instructions:                         283|283                  (No change)
  L1 Hits:                              380|380                  (No change)
  LL Hits:                                0|0                    (No change)
  RAM Hits:                               8|8                    (No change)
  Total read+write:                     388|388                  (No change)
  Estimated Cycles:                     660|660                  (No change)
  Comparison with bench_new page:(0x1000, 1)
  Instructions:                         238|283                  (-15.9011%) [-1.18908x]
  L1 Hits:                              322|380                  (-15.2632%) [-1.18012x]
  LL Hits:                                0|0                    (No change)
  RAM Hits:                               4|8                    (-50.0000%) [-2.00000x]
  Total read+write:                     326|388                  (-15.9794%) [-1.19018x]
  Estimated Cycles:                     462|660                  (-30.0000%) [-1.42857x]
gungraun::bench::bench_old cache_line:(128, 128)
  Instructions:                         295|295                  (No change)
  L1 Hits:                              388|388                  (No change)
  LL Hits:                                0|0                    (No change)
  RAM Hits:                              12|12                   (No change)
  Total read+write:                     400|400                  (No change)
  Estimated Cycles:                     808|808                  (No change)
  Comparison with bench_new cache_line:(128, 128)
  Instructions:                         250|295                  (-15.2542%) [-1.18000x]
  L1 Hits:                              331|388                  (-14.6907%) [-1.17221x]
  LL Hits:                                0|0                    (No change)
  RAM Hits:                               7|12                   (-41.6667%) [-1.71429x]
  Total read+write:                     338|400                  (-15.5000%) [-1.18343x]
  Estimated Cycles:                     576|808                  (-28.7129%) [-1.40278x]

Gungraun result: Ok. 4 without regressions; 0 regressed; 0 filtered; 4 benchmarks finished in 0.81081s
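To make the derived rows easier to read, here is a small sketch that recomputes them from the raw counters. It assumes the usual Callgrind-style estimate of 1 cycle per L1 hit, 5 per LL hit, and 35 per RAM hit. The RAM weight matches every figure above, while the LL weight cannot be checked from these results because all LL hit counts are zero. The function names are illustrative only.

```rust
/// Estimated cycles from cache counters, assuming weights of 1, 5, and 35.
fn estimated_cycles(l1_hits: u64, ll_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + 5 * ll_hits + 35 * ram_hits
}

/// Relative change of `new` against `old`, in percent (negative means fewer).
fn percent_change(new: u64, old: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// How many times larger `old` is than `new`.
fn reduction_factor(new: u64, old: u64) -> f64 {
    old as f64 / new as f64
}

fn main() {
    // page:(0x1000, 1), new vs. old
    let new_cycles = estimated_cycles(322, 0, 4);
    let old_cycles = estimated_cycles(380, 0, 8);
    assert_eq!(new_cycles, 462);
    assert_eq!(old_cycles, 660);

    println!(
        "instructions: {:.4}% ({:.5}x)",
        percent_change(238, 283),
        reduction_factor(238, 283)
    );
    println!(
        "estimated cycles: {:.4}% ({:.5}x)",
        percent_change(new_cycles, old_cycles),
        reduction_factor(new_cycles, old_cycles)
    );
}
```

Because the RAM weight dominates, halving the RAM hits (8 to 4) accounts for most of the drop in estimated cycles, even though the instruction count falls by only about 16 %.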

@mkroening mkroening requested a review from stlankes May 11, 2026 17:50
@mkroening mkroening self-assigned this May 11, 2026

@github-actions github-actions Bot left a comment

Benchmark Results

| Benchmark | Current: 9d1304c | Previous: 684b8f8 | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 95.86 s | 88.84 s | 1.08 |
| startup_benchmark File Size | 0.75 MB | 0.76 MB | 0.99 |
| Startup Time - 1 core | 0.79 s (±0.03 s) | 0.78 s (±0.03 s) | 1.02 |
| Startup Time - 2 cores | 0.79 s (±0.03 s) | 0.81 s (±0.03 s) | 0.97 |
| Startup Time - 4 cores | 0.78 s (±0.03 s) | 0.82 s (±0.03 s) | 0.96 |
| multithreaded_benchmark Build Time | 96.40 s | 90.83 s | 1.06 |
| multithreaded_benchmark File Size | 0.86 MB | 0.86 MB | 0.99 |
| Multithreaded Pi Efficiency - 2 Threads | 69.22 % (±4.59 %) | 91.02 % (±7.04 %) | 0.76 |
| Multithreaded Pi Efficiency - 4 Threads | 41.90 % (±3.15 %) | 45.15 % (±3.11 %) | 0.93 |
| Multithreaded Pi Efficiency - 8 Threads | 19.71 % (±1.88 %) | 25.97 % (±1.81 %) | 0.76 |
| micro_benchmarks Build Time | 90.69 s | 99.29 s | 0.91 |
| micro_benchmarks File Size | 0.86 MB | 0.87 MB | 0.99 |
| Scheduling time - 1 thread | 69.30 ticks (±4.58 ticks) | 72.54 ticks (±4.79 ticks) | 0.96 |
| Scheduling time - 2 threads | 36.97 ticks (±3.53 ticks) | 39.25 ticks (±4.34 ticks) | 0.94 |
| Micro - Time for syscall (getpid) | 3.06 ticks (±0.24 ticks) | 2.91 ticks (±0.25 ticks) | 1.05 |
| Memcpy speed - (built_in) block size 4096 | 80253.63 MByte/s (±55492.00 MByte/s) | 74109.84 MByte/s (±51262.82 MByte/s) | 1.08 |
| Memcpy speed - (built_in) block size 1048576 | 29860.91 MByte/s (±24319.61 MByte/s) | 30258.77 MByte/s (±24863.30 MByte/s) | 0.99 |
| Memcpy speed - (built_in) block size 16777216 | 28063.70 MByte/s (±23087.39 MByte/s) | 25033.33 MByte/s (±20902.16 MByte/s) | 1.12 |
| Memset speed - (built_in) block size 4096 | 80058.40 MByte/s (±55351.88 MByte/s) | 74010.75 MByte/s (±51195.39 MByte/s) | 1.08 |
| Memset speed - (built_in) block size 1048576 | 30607.79 MByte/s (±24767.72 MByte/s) | 30998.30 MByte/s (±25278.82 MByte/s) | 0.99 |
| Memset speed - (built_in) block size 16777216 | 28823.09 MByte/s (±23534.61 MByte/s) | 25782.40 MByte/s (±21381.84 MByte/s) | 1.12 |
| Memcpy speed - (rust) block size 4096 | 71074.86 MByte/s (±49769.85 MByte/s) | 71023.68 MByte/s (±49725.32 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 1048576 | 29948.45 MByte/s (±24353.49 MByte/s) | 29888.71 MByte/s (±24578.74 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 28306.15 MByte/s (±23327.48 MByte/s) | 25343.61 MByte/s (±21099.87 MByte/s) | 1.12 |
| Memset speed - (rust) block size 4096 | 71301.13 MByte/s (±49928.11 MByte/s) | 71748.34 MByte/s (±50221.83 MByte/s) | 0.99 |
| Memset speed - (rust) block size 1048576 | 30657.60 MByte/s (±24759.52 MByte/s) | 30618.94 MByte/s (±24995.22 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 29076.48 MByte/s (±23775.26 MByte/s) | 26088.88 MByte/s (±21572.54 MByte/s) | 1.11 |
| alloc_benchmarks Build Time | 90.35 s | 90.71 s | 1.00 |
| alloc_benchmarks File Size | 0.83 MB | 0.84 MB | 0.99 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 7165.98 Ticks (±179.29 Ticks) | 9085.13 Ticks (±127.02 Ticks) | 0.79 |
| Allocations - Average Allocation time (no fail) | 7165.98 Ticks (±179.29 Ticks) | 9085.13 Ticks (±127.02 Ticks) | 0.79 |
| Allocations - Average Deallocation time | 1223.58 Ticks (±536.53 Ticks) | 860.18 Ticks (±162.57 Ticks) | 1.42 |
| mutex_benchmark Build Time | 90.17 s | 92.18 s | 0.98 |
| mutex_benchmark File Size | 0.86 MB | 0.87 MB | 0.99 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 13.04 ns (±0.80 ns) | 13.08 ns (±0.89 ns) | 1.00 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 15.64 ns (±9.88 ns) | 15.24 ns (±8.38 ns) | 1.03 |
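When reading the rows above, the Performance Ratio appears to be Current divided by Previous, so a value below 1 is an improvement for time-like metrics and a regression for throughput or efficiency metrics. The sketch below recomputes the ratio for the two allocation timing rows and applies a crude noise check: whether the two ± ranges overlap. What the ± value means (standard deviation or otherwise) is not stated in the report, so the overlap test is an assumption-laden heuristic rather than a statistical test, and the type and function names are hypothetical.

```rust
/// A reported value with its "±" spread, as printed in the table.
#[derive(Clone, Copy)]
struct Measurement {
    mean: f64,
    spread: f64,
}

/// Ratio as it appears in the table: current / previous.
fn ratio(current: Measurement, previous: Measurement) -> f64 {
    current.mean / previous.mean
}

/// Crude noise heuristic: do the [mean - spread, mean + spread] ranges overlap?
fn ranges_overlap(a: Measurement, b: Measurement) -> bool {
    a.mean - a.spread <= b.mean + b.spread && b.mean - b.spread <= a.mean + a.spread
}

fn main() {
    // Average Allocation time (Ticks)
    let alloc_cur = Measurement { mean: 7165.98, spread: 179.29 };
    let alloc_prev = Measurement { mean: 9085.13, spread: 127.02 };
    // Average Deallocation time (Ticks)
    let dealloc_cur = Measurement { mean: 1223.58, spread: 536.53 };
    let dealloc_prev = Measurement { mean: 860.18, spread: 162.57 };

    println!(
        "alloc:   ratio {:.2}, ranges overlap: {}",
        ratio(alloc_cur, alloc_prev),
        ranges_overlap(alloc_cur, alloc_prev)
    );
    println!(
        "dealloc: ratio {:.2}, ranges overlap: {}",
        ratio(dealloc_cur, dealloc_prev),
        ranges_overlap(dealloc_cur, dealloc_prev)
    );
}
```

Under this heuristic the allocation-time ranges are disjoint, while the deallocation-time ranges overlap, so the 1.42 ratio for deallocation may be within run-to-run noise.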

This comment was automatically generated by workflow using github-action-benchmark.

@mkroening mkroening force-pushed the alloc-syscalls branch 2 times, most recently from 8edbe48 to 6ff2667 May 11, 2026 19:02
@mkroening mkroening marked this pull request as ready for review May 11, 2026 19:22